The topology of an instant messaging system is described. Statistical measures of the network are 
given and compared with the statistics of a comparable random graph. The scale-free character of 
the network is examined and implications are given for the structure of social networks and instant 
messenger security. 

I. INTRODUCTION 

The last few years have seen a large advance in our
understanding of networks whose structures are by nature
non-equilibrium and non-random. These networks
have been used to study systems as diverse as the Internet
router and WWW hyperlink networks, electric power
grids, and cellular metabolic pathways [?, ?]. In particular,
these networks prominently feature a power-law
frequency distribution for the nodes' degree (a scale-free
network), a network diameter smaller than that of a comparable
random graph (one with the same number of nodes
and the same average degree per node), and a much larger
clustering coefficient than a comparable random graph.

Among the most interesting of the networks studied
are those that describe human social interaction. The
phenomenon of six degrees of separation, first recognized
by Stanley Milgram [?], is well known and documented
in both academic and popular culture. The most detailed
of these networks studied are actor collaboration
networks and scientific collaboration networks [?].
Interestingly, the web of human sexual partners has also
been documented [?]. Typically these networks share the
features mentioned above that differentiate them from
random graphs and display a scale-free character. This
paper aims to add to these studies of social contact
networks by providing another example: the connections of
users in an instant messaging service.



II. OVERVIEW 

Instant messaging has grown at a phenomenal rate in
the last several years to become a major form of communication
both over the Internet and within company
intranets. Instant messaging has become so important,
in fact, that the FCC has attempted to force the largest
provider of instant messaging, AOL Time Warner, to open
its software for interoperability [?].



Electronic address: rds2u@alumni.virginia.edu



Instant messaging is distinguished from regular chat by
being a one-on-one conversation between two users on an
instant messaging network. Typically, in instant messaging
systems each user has a user name and a contact list
containing the user names of other users with whom they
often communicate. It is this feature of instant messaging
that makes it amenable to scientific study and statistical
analysis. If one imagines each user as a node and each
contact on the user's contact list as an out-directed edge,
the community on an instant messaging network can be
modeled using graph theory.
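
As a minimal sketch of this modeling step, the following Python fragment turns a set of contact lists into a directed graph and counts in and out degrees. The toy data, the function name `build_directed_graph`, and the choice to ignore self-references are illustrative assumptions, not details of the data set studied here.

```python
from collections import defaultdict

def build_directed_graph(contact_lists):
    """Return (out_edges, in_edges) adjacency sets from {user: [contacts]}."""
    out_edges = defaultdict(set)
    in_edges = defaultdict(set)
    for user, contacts in contact_lists.items():
        out_edges[user]  # touch the entry so users with empty lists are still nodes
        for contact in contacts:
            if contact == user:
                continue  # assumption: a user listing themselves is ignored
            out_edges[user].add(contact)   # user -> contact (out-directed edge)
            in_edges[contact].add(user)    # the same edge seen from the contact
    return out_edges, in_edges

# Hypothetical toy data: anonymized user numbers mapped to contact numbers.
contact_lists = {1: [2, 3], 2: [1], 3: [], 4: [1]}
out_edges, in_edges = build_directed_graph(contact_lists)
out_degree = {u: len(out_edges[u]) for u in contact_lists}
in_degree = {u: len(in_edges[u]) for u in contact_lists}
```

Here user 1 has two out-directed edges (to users 2 and 3) and two in-directed edges (from users 2 and 4), which illustrates how a contact list need not be reciprocated.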

Under these assumptions, an
instant messaging network represents a non-equilibrium
graph in that nodes (users) are added and removed over
time and edges most likely accumulate on users in a non-random
fashion. One possible model for this growth is
the Barabasi-Albert (BA) model, in which edges are formed
by preferential attachment: nodes with more edges
are more likely to accumulate edges as time goes on. Similarly,
one could hypothesize that a user is more likely to form
out-directed edges (add users to the user's contact list)
the more users are already present on their contact list,
and that a user is more likely to receive in-directed edges
(be on another user's contact list) in a similar fashion.
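
The preferential attachment idea can be sketched in a few lines of Python. This is a simplified, undirected growth process under assumed parameters (`n_nodes`, `m`, and the fully connected starting core are all illustrative choices), not a model fitted to the network studied here.

```python
import random

def grow_preferential(n_nodes, m, seed=0):
    """Grow an undirected graph where each new node attaches m edges,
    choosing targets with probability proportional to their degree."""
    rng = random.Random(seed)
    # Start from a small fully connected core of m + 1 nodes.
    edges = [(i, j) for i in range(m + 1) for j in range(i)]
    # Each node appears in this list once per edge end, so sampling
    # uniformly from it is sampling proportionally to degree.
    endpoints = [v for e in edges for v in e]
    for new in range(m + 1, n_nodes):
        targets = set()
        while len(targets) < m:
            targets.add(rng.choice(endpoints))
        for t in targets:
            edges.append((new, t))
            endpoints.extend((new, t))
    return edges

edges = grow_preferential(n_nodes=2000, m=3)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
```

In such a process the earliest nodes tend to end up with degrees far above the average, which is the "rich get richer" behavior that produces a heavy-tailed degree distribution.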

With these assumptions, an instant messaging network's
graph should probably display a scale-free character.
To test this hypothesis, an instant messaging network
using the open-source Jabber protocol was examined for
such features.



A. The Jabber Instant Messaging Protocol 

To clearly understand some of the assumptions and
conclusions in this paper, a cursory overview of the Jabber
protocol is necessary [10]. The Jabber protocol is
based on XML and uses a distributed client-server
architecture. Jabber was consciously modeled on the
architecture of email systems, so instead of one central
server like AOL's Instant Messenger or Microsoft's MSN
Messenger, Jabber has many servers in many locations.
Jabber clients are the users who communicate by instant
messaging. Clients can communicate with all other
clients on their Jabber servers and with clients on other
Jabber servers, since the Jabber servers can communicate
with each other. In addition, Jabber supports extensions
called transports that allow Jabber clients to communicate
with clients using other protocols such as those of
AOL, Microsoft, ICQ, or Yahoo instant messaging.



III. NETWORK STATISTICS 

Network      $\langle k \rangle$   $C$                     $\ell$
directed     8.2                   -                       4.35
undirected   9.6                   0.33                    4.1
random       9.6                   $1.9 \times 10^{-4}$    4.79

TABLE I: Comparison of Network Statistical Measures 



The network studied was the instant messaging
database from nioki.com, a French-language teen-oriented
web site. Appropriate measures were taken
to completely preserve the anonymity and privacy of
the users, as is explained in detail in Appendix A. The
nioki.com database contained 50,158 users (nodes) with
almost 500,000 edges. Following the model explained earlier,
this instant messaging network was modeled as a directed
graph. This is different from other social network studies
such as the actor-movie collaboration network, which was
modeled as a bipartite graph, and the scientific collaborations,
which were modeled as undirected graphs.

The nioki.com instant messaging network was found
to exhibit all the characteristics of a scale-free network.
The inward and outward directed edge frequency distributions
both followed power laws, with $\gamma_{in} = 2.2$ and
$\gamma_{out} = 2.4$. The average in and out degrees are
$\langle k_{in} \rangle = 9.1$ and $\langle k_{out} \rangle = 8.2$.
The average in and out degrees would be
identical in a network with no outside contacts. However,
as explained in the description of Jabber, clients have the
ability to communicate with clients on other servers outside
their current server, so there are probably contacts
with clients that are not on nioki.com's server. The data
does not indicate who these contacts are, but the difference
in the average in and out degrees per node intimates
their existence. The diameter of the network is $\ell = 4.35$,
so that there are about 4-5 users on average between any
two users on the network. These values indicate the small-world
character of the nioki.com network as compared to
a random graph (Table I).
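
The paper does not state how the power-law exponents were obtained. One simple possibility, shown here purely as an assumption, is a least-squares line through the log-log frequency distribution; its negative slope is the exponent $\gamma$. The function name and the synthetic data are hypothetical, and straight-line fits of this kind can be sensitive to noise in the sparsely populated tail.

```python
import math

def fit_power_law_exponent(freq):
    """Least-squares slope of log(frequency) vs log(degree).
    Returns gamma such that frequency ~ degree ** (-gamma)."""
    pts = [(math.log(k), math.log(f)) for k, f in freq.items() if k > 0 and f > 0]
    n = len(pts)
    mean_x = sum(x for x, _ in pts) / n
    mean_y = sum(y for _, y in pts) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in pts)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pts)
    return -sxy / sxx

# Synthetic frequencies that follow an exact power law with exponent 2.2.
freq = {k: k ** -2.2 for k in range(1, 51)}
gamma = fit_power_law_exponent(freq)
```

On this noise-free synthetic input the fit recovers the exponent used to generate it; real degree counts would scatter around the line.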

Since this network is modeled as a directed graph, it
presents a rather asymmetric view of human social interaction.
In a directed graph it is possible for one user to
"know" another without the other user reciprocally expressing
such a relationship. This is because the contact
list data does not require a reciprocal relationship
between two users. Measurements indicate that about
82% of the contacts in the network are in both directions,
so on average 82% of the users on a given user's (user A's)
contact list also have user A on their contact list. In
order to get a clearer view and calculate the clustering
coefficient, a new list of users and contacts was created,
adding those edges necessary to make the network undirected.
In this case the power-law exponent of the degree
distribution became $\gamma = 1.8$. The average degree was
$\langle k \rangle = 9.6$ and the diameter of the network decreased to
$\ell = 4.1$. The average clustering coefficient was calculated
at $C = 0.33$, further reinforcing the small-world character
of the network.
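
The three steps described above (measuring reciprocity, adding reverse edges to obtain an undirected graph, and computing the average clustering coefficient) can be sketched as follows. The function names and toy graph are hypothetical, and counting nodes with fewer than two neighbors as zero is one common convention, assumed here rather than taken from the paper.

```python
def reciprocity(out_edges):
    """Fraction of directed edges u->v for which v->u also exists."""
    total = sum(len(vs) for vs in out_edges.values())
    mutual = sum(1 for u, vs in out_edges.items() for v in vs
                 if u in out_edges.get(v, ()))
    return mutual / total if total else 0.0

def symmetrize(out_edges):
    """Add the reverse of every edge so the graph becomes undirected."""
    und = {}
    for u, vs in out_edges.items():
        und.setdefault(u, set())
        for v in vs:
            und[u].add(v)
            und.setdefault(v, set()).add(u)
    return und

def average_clustering(und):
    """Mean over nodes of (links among neighbors) / (possible links).
    Nodes with fewer than two neighbors contribute 0 (assumed convention)."""
    total = 0.0
    for u, nbrs in und.items():
        k = len(nbrs)
        if k < 2:
            continue
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in und[a])
        total += 2.0 * links / (k * (k - 1))
    return total / len(und) if und else 0.0

# Hypothetical toy graph: user -> set of contacts.
out_edges = {1: {2, 3}, 2: {1, 3}, 3: {1}, 4: {1}}
r = reciprocity(out_edges)
und = symmetrize(out_edges)
c = average_clustering(und)
```

In the toy graph four of the six directed edges are reciprocated, and the symmetrized graph contains one triangle (users 1, 2, 3) plus a pendant user 4.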

The final measures computed were the sizes of the giant
weakly connected component (GWCC) and
the giant strongly connected component (GSCC). The
GWCC is the largest set of nodes in which every node can
be reached from every other node in the component when
ignoring the directions of edges. Its size was calculated at a very
large 49,801 users, or over 99% of the users on nioki.com.
Only about 0.7% of the users were in disconnected components.
The GSCC is the largest set of nodes
in which any node can be reached from
any other node through directed edges. Its size was calculated
to be 44,581, or 89% of nioki.com's users. These very
large values for the connected components are
most likely explained by the structure of the nioki.com
website itself. It is mostly made up of users who communicate
using the nioki instant messaging service with
other users on nioki.com. It is unlikely that most users
use nioki.com's instant messenger as a primary tool for
reaching users on other servers or services.
Being a teen-oriented website geared toward socialization,
it likely has a tightly knit community built around shared
interests. This is unlike the actor or scientific collaboration
databases, where connections are mainly limited
to professional roles and fields of research.
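
A minimal sketch of how the two component sizes can be computed is given below. It uses a plain graph traversal for the weak components and Kosaraju's two-pass algorithm for the strong components; this is a standard technique chosen for illustration, not necessarily the procedure used for the nioki.com data. The function name and toy graph are hypothetical.

```python
def giant_components(out_edges):
    """Return (GWCC size, GSCC size) for a directed graph {u: set(v)}."""
    nodes = set(out_edges)
    for vs in out_edges.values():
        nodes.update(vs)
    fwd = {u: set(out_edges.get(u, ())) for u in nodes}
    rev = {u: set() for u in nodes}
    for u in nodes:
        for v in fwd[u]:
            rev[v].add(u)

    def reach(start, adjs, seen):
        """Mark and count nodes reachable from start via the given adjacency maps."""
        stack, count = [start], 0
        seen.add(start)
        while stack:
            u = stack.pop()
            count += 1
            for adj in adjs:
                for v in adj[u]:
                    if v not in seen:
                        seen.add(v)
                        stack.append(v)
        return count

    # Weak components: follow edges in both directions.
    seen, gwcc = set(), 0
    for u in nodes:
        if u not in seen:
            gwcc = max(gwcc, reach(u, (fwd, rev), seen))

    # Strong components (Kosaraju): record DFS finish order on the graph,
    # then sweep the reversed graph in decreasing finish order.
    order, seen = [], set()
    for s in nodes:
        if s in seen:
            continue
        seen.add(s)
        stack = [(s, iter(fwd[s]))]
        while stack:
            u, it = stack[-1]
            for v in it:
                if v not in seen:
                    seen.add(v)
                    stack.append((v, iter(fwd[v])))
                    break
            else:
                order.append(u)  # all out-neighbors of u are finished
                stack.pop()
    seen, gscc = set(), 0
    for u in reversed(order):
        if u not in seen:
            gscc = max(gscc, reach(u, (rev,), seen))
    return gwcc, gscc

# Hypothetical toy graph: a directed 3-cycle, one user pointing into it,
# and a separate pair of users.
out_edges = {1: {2}, 2: {3}, 3: {1}, 4: {1}, 5: {6}, 6: set()}
gwcc, gscc = giant_components(out_edges)
```

In the toy graph, user 4 belongs to the largest weak component but not to the largest strong component, because nobody in the cycle lists user 4 as a contact. The same asymmetry explains why the GSCC of a contact-list network is smaller than its GWCC.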



IV. CALCULATIONS OF RANDOM GRAPH 
ATTRIBUTES 

The random graph statistics in Table I were computed
using the following equations, which are explained
in detail in [?]. The comparable random graph
was assumed to have the same number of nodes and average
degree per node as the undirected model of the
instant messenger network. From this information the
average clustering coefficient of the random graph can be
calculated as



$$ C = \frac{\langle k \rangle}{N} \tag{1} $$



It is interesting to note that the clustering coefficient of a
random graph is the same as the node connection probability.
The average shortest path length was estimated using the
approximation



$$ \ell = \frac{\ln(N)}{\ln(\langle k \rangle)} \tag{2} $$
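
As a quick arithmetic check, the random-graph entries of Table I can be reproduced from the reported number of nodes and the average degree of the undirected network. The sketch assumes that Eqs. (1) and (2) are the standard random-graph estimates $C = \langle k \rangle / N$ and $\ell = \ln(N) / \ln(\langle k \rangle)$; the variable names are illustrative.

```python
import math

N = 50158      # number of users (nodes) reported for nioki.com
k_avg = 9.6    # average degree of the undirected network

C_random = k_avg / N                          # Eq. (1): clustering coefficient
ell_random = math.log(N) / math.log(k_avg)    # Eq. (2): path length estimate

print(f"C_random   = {C_random:.2e}")
print(f"ell_random = {ell_random:.2f}")
```

The results, about $1.9 \times 10^{-4}$ and 4.79, agree with the random-graph values in Table I, and the clustering coefficient is more than three orders of magnitude below the measured $C = 0.33$.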







FIG. 1: Log-log degree distribution of out-directed edges in 
the nioki.com instant messenger network 




FIG. 2: Log-log degree distribution of in-directed edges in the 
nioki.com instant messenger network 




FIG. 3: Log-log degree distribution of constructed undirected 
nioki.com instant messenger network 



V. COMPARISONS WITH BARABASI-ALBERT 
MODEL 

Earlier, the possibility that this network grows according
to the Barabasi-Albert model was considered. A key
indication of this would be a measurement of the preferential
attachment probability $\Pi(k)$. Though the empirical
data from the network and its scale-free character hint
strongly towards the Barabasi-Albert model or a comparable
one, time-dependent data was not available to
allow the determination of the shape (linear or curved)
or functional form of $\Pi(k)$.



VI. RELEVANCE TO SOCIAL NETWORKS 

A frequent question, given the ever faster globalization
and communication in the world, is how connected we all
really are. This research covered a relatively large sample
of about 50,000 people and in some ways gives a glimpse
into the connectivity of our society, but in other ways
falls short.

This research should give additional credence to the
growing evidence of the scale-free nature of social and
professional contacts in society at large. Combined with
the earlier studies on professional collaborations, it seems
to indicate that society does exhibit a "small world" effect.
This research finds that the small world of nioki.com
is based on a scale-free topology; however, there are other
models of small worlds, including the Watts-Strogatz
model, which exhibit similar features. The small diameter
compared to the number of nodes in the network
indicates that the degrees of separation in nioki.com are
a bit smaller than the six Stanley Milgram measured in
his studies. However, there are caveats to the wholesale
application of these results to larger society. Nioki.com is
probably more connected than society at large due to its
foundation of shared interests and demographics (young
adults). The large size of the GWCC and GSCC is an
indication of this. Thus, the researcher does not think
it would be completely accurate to say everyone in the
world is connected by 4-5 people. In fact, a recent criticism
of Milgram's work [?] asserts that studies
that do not take into account the increased likelihood
of connections based on factors such as demographics
or professional affiliation may not clearly represent the
larger society. Though the researcher believes this research
further emphasizes that human social networks have a
small-world character, it cannot address the question of
connectivity across varying demographics or social
boundaries.



VII. APPLICATIONS TO INSTANT 
MESSAGING SECURITY 

In instant messaging, as with all computer network communication
tools, security is often a paramount consideration.
Though there are many problems of interest in
the security of instant messaging, from the privacy of conversations
to the security of user accounts and passwords,
the aspect of security most pertinent here is the spread
of worms across instant messenger networks. There have
been several recent outbreaks of worms on instant messaging
networks [?]. The spread of epidemics on scale-free
and small-world networks has been well studied. It is
known that there is no epidemic threshold for infinitely
large scale-free networks [13], [14], and worms or viruses
can spread rapidly through such a network. Though there
have not been any devastating worms so far, it is wise to
prepare for the worst case. Let us assume
a worm spreads through an instant messaging network
by infecting a node and then spreading itself along all or
some of the out-directed edges to new nodes, which
may in turn be infected. This is similar to recent worms which
send a message containing an infected link to all members
of a user's contact list. The dynamics of this kind
of epidemic have been discussed [11]-[14]. However,
the options for stopping or slowing the epidemic are varied.
One can alert users or provide a patch or software to
prevent the spread, as was done with the Code Red worm
that infected Windows 2000 servers [?]. With a more
extreme event, however, more radical measures could be
necessary.
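
The assumed spreading mechanism can be illustrated with a deliberately simplified simulation. In this sketch every newly infected user forwards the worm along all of their out-directed edges in the next round, and every recipient becomes infected; both are worst-case assumptions made for illustration, and the function name and toy graph are hypothetical.

```python
def spread_rounds(out_edges, patient_zero):
    """Simulate a worm that, each round, sends itself along every
    out-directed edge of every newly infected node.
    Returns the cumulative number of infected nodes after each round."""
    infected = {patient_zero}
    frontier = [patient_zero]
    history = [1]
    while frontier:
        nxt = []
        for u in frontier:
            for v in out_edges.get(u, ()):
                if v not in infected:
                    infected.add(v)
                    nxt.append(v)
        if nxt:
            history.append(len(infected))
        frontier = nxt
    return history

# Hypothetical toy graph: user -> set of contacts on that user's list.
out_edges = {1: {2, 3}, 2: {4}, 3: {4}, 4: {5}, 5: set(), 6: {1}}
history = spread_rounds(out_edges, patient_zero=1)
```

Note that user 6 is never infected: the worm travels along out-directed edges only, so a user who lists an infected contact, but appears on no infected user's list, is not reached.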

Unlike the Internet, where control is decentralized, the
client-server nature of instant messaging makes more radical
measures possible. Research has indicated that scale-free
networks, though they are robust to random node
failures, are very vulnerable to attack [17], [18], [19]. A
directed attack at the most connected nodes could severely
damage a network such as the Internet. However,
this characteristic could be reversed and turned to an
advantage in halting an epidemic [?], [?]. In a severe
instant messaging worm outbreak the server administrators
could slow, but not completely stop, the spread of a
worm in a way that leaves the service unaffected for most
users. By disabling the accounts of the most connected
users on the network, they could effectively increase the
network's diameter, making the propagation of the epidemic
much slower and buying time for a patch or another
curative measure. The diameter of the nioki.com
network after a certain percentage of the most connected
nodes are removed is shown in Figure 4. Removing the
top 10% most connected users increases the diameter of the
network almost twofold. However, even disabling up to
10% of the most connected users would leave connectivity
for the other 90% of the network and allow the service
to have more time to cope with the outbreak and help
other users. There are caveats to this plan, however.
It would only work assuming that the most connected
users are online frequently enough to spread the worm.
If many of the most connected users are rarely online,
this strategy may not produce its full effect. Also, this
strategy would still deny a segment of users access to the
network for an unspecified period of time and would possibly
upset many users if it is used too frequently.
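
The effect of disabling the most connected accounts can be sketched on a toy graph. The code below follows this paper's usage of "diameter" as the average shortest-path length between reachable pairs; the ranking by total (in plus out) degree, the function names, and the hub-and-ring example are assumptions made for illustration.

```python
from collections import deque

def average_path_length(out_edges, removed=frozenset()):
    """Mean shortest-path length over all reachable ordered pairs,
    ignoring nodes in `removed`. Runs a BFS from every node."""
    nodes = [u for u in out_edges if u not in removed]
    total = pairs = 0
    for s in nodes:
        dist = {s: 0}
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in out_edges.get(u, ()):
                if v not in removed and v not in dist:
                    dist[v] = dist[u] + 1
                    queue.append(v)
        total += sum(dist.values())
        pairs += len(dist) - 1
    return total / pairs if pairs else float("inf")

def top_connected(out_edges, fraction):
    """Nodes with the largest total (in + out) degree."""
    deg = {u: len(vs) for u, vs in out_edges.items()}
    for vs in out_edges.values():
        for v in vs:
            deg[v] = deg.get(v, 0) + 1
    ranked = sorted(deg, key=deg.get, reverse=True)
    return frozenset(ranked[:int(len(ranked) * fraction)])

# Hypothetical toy network: a ring of 10 users, all of whom also
# exchange contacts with one highly connected user (node 10).
ring = 10
out_edges = {i: {(i - 1) % ring, (i + 1) % ring, ring} for i in range(ring)}
out_edges[ring] = set(range(ring))

before = average_path_length(out_edges)
hubs = top_connected(out_edges, 0.10)
after = average_path_length(out_edges, removed=hubs)
```

With the hub present, any two users are at most two steps apart; once it is removed, messages must travel around the ring and the average path length grows from about 1.6 to about 2.8. A worm spreading one hop per round would be slowed accordingly, while the remaining users stay connected.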



VIII. CONCLUSION 

In this paper the network structure of the instant
messaging community of nioki.com was investigated and
demonstrated to be a scale-free network. Though the
preferential attachment function was not determined, it is likely
that the Barabasi-Albert or a similar model describes its
evolution. Knowledge of this structure may tell us more
about social networks in the real world and how to prevent
the spread of worms on instant messaging networks.



FIG. 4: Graph of changes in the diameter of the nioki.com
directed network. Network diameter vs. percentage of most
connected nodes removed


APPENDIX A: NIOKI.COM USER PRIVACY 
CONSIDERATIONS 

Of the utmost importance was protecting the privacy 
of the users of the nioki.com instant messaging network 
that was researched. Here all privacy precautions taken 
are outlined and explained in detail. 



The data received by the researcher was
in the rawest form possible. The data from nioki.com
was prepared so that all users and their contacts were
anonymized as numbers. Users and their contacts were
then matched up by matching a user number with a contact
number. The researcher did not receive any user
names, emails, IP addresses, geographical locations, personal
information or activity information, or any other
data that could allow him to either determine the identity
of any given user or extrapolate anything about a
user's activity on nioki.com or the Internet at large. It
would have been impossible for the researcher to determine
any direct personal information or identity about
anyone from this raw data.

The data remained solely in the hands of the researcher at all
times and was not given to any collaborators, published
publicly or distributed in raw form, or sold for profit. The
data was statistically analyzed in aggregate, and therefore
no information about any specific users could be extrapolated.
A rough analogy for this study would be
analyzing census data for a town: aggregate patterns will
emerge, but no specific information about individual inhabitants
can be gleaned from the data. In order to further
protect privacy, the researcher cannot distribute the
raw data to any others interested, even for research purposes.
Such requests must be made directly to nioki.com.

If there are any other questions or considerations regarding
the privacy of this research, please direct them
to the email address given at the top of the paper.



